ABSTRACT

This panel's goal was to improve the validity and utility of
measurement, design, and analysis in research on teaching through the
stimulation of new methodological knowledge and through the
identification and translation of useful existing knowledge from other
disciplines. The panel tried to identify as many methodological
problems as possible which limit the productivity of research on
teaching, and then adopted four "approaches" which it believed to be
solutions that encompass all the methodological problems of research on
teaching. These four approaches were (1) to develop and test new
analysis and design strategies appropriate for research on teaching;
(2) to increase understanding of existing measurement strategies for
research on teaching and, where appropriate, develop new measurement
strategies; (3) to identify, demonstrate, and disseminate methodologies
from other research disciplines which appear to have merit for research
on teaching; and (4) to consider the utility of standards for improving
methodological practice in research on teaching. (BD)

PREFACE

The volume before you is the report of one of ten panels that
participated in a five-day conference in Washington during the summer of
1974. The primary objective of this Conference was to provide an agenda
for further research and development to guide the Institute in its
planning and funding over the next several years. Both by the
involvement of some 100 respected practitioners, administrators, and
researchers as panelists, and by the public debate and criticism of the
panel reports, the Institute aims to create a major role for the
practitioner and research communities in determining the direction of
government funding.

The Conference itself is seen as only an event in the middle of the
process. In many months of preparation for the Conference, the staff met
with a number of groups (students, teachers, administrators, etc.) to
develop coherent problem statements which served as a charge to the
panelists. Panel chairmen and others met both before and after the
Conference. Several other panelists were commissioned to pull together
the major themes and recommendations that kept recurring in different
panels (being reported in a separate Conference Summary Report). Reports
are being distributed to practitioner and research communities. The
Institute encourages other interest groups to debate and critique
relevant panel reports from their own perspectives.

The Conference rationale stems from the frank acknowledgment that
much of the funding for educational research and development projects
has not been coordinated and sequenced in such a way as to avoid undue
duplication yet fill significant gaps, or in such a way as to build a
cumulative impact relevant to educational practice. Nor have an agency's
affected constituencies ordinarily had the opportunity for public
discussion of funding alternatives and proposed directions prior to the
actual allocation of funds. The Conference is thus seen as the first
major Federal effort to develop a coordinated research effort in the
social sciences, the only comparable efforts being the National Cancer
Plan and the National Heart and Lung Institute Plan, which served as
models for the present Conference.

As one of the Conference panels points out, education in the United
States is moving toward change, whether we do anything about it or not.
The outcomes of sound research and development, though enlisting only
a minute portion of the education dollar, provide the leverage by
which such change can be afforded coherent direction.

In implementing these notions for the area of teaching, the Conference
panels were organized around the major points in the career of a teacher:
the teacher's recruitment and selection (one panel), training (five
panels), and utilization (one panel). In addition, a panel was formed
to examine the role of the teacher in new instructional systems. Finally,
there were two panels dealing with research methodology and theory
development.



[Figure: organization of the Conference panels. Recoverable labels:
educational practice; teaching as human interaction; teaching as
behavior analysis; teaching as skill performance; teaching as a
linguistic process; teaching as clinical information processing;
training & performance; planning & research; personnel roles; new
systems; research methodology; theory development.]



Within its specific problem area, each panel refined its goal
statement, outlined several "approaches" or overall strategies,
identified potential "programs" within each approach, and sketched out
illustrative projects so far as this was appropriate and feasible.


Since the brunt of this work was done in concentrated sessions in
the space of a few days, the resulting documents are not polished,
internally consistent, or exhaustive. They are working papers, and their
publication is intended to stimulate debate and refinement. The full
list of panel reports is given on the following page. We expect serious
and concerned readers of the reports to have suggestions and comments.
Such comments, or requests for other panel reports, should be directed
to:

Assistant Director
Program on Teaching and Curriculum
National Institute of Education
1200 19th Street, N.W.
Washington, D.C. 20208

As the organizer and overall chairman for the Conference, and editor
for this series of reports, Professor N. L. Gage of Stanford University
richly deserves the appreciation of those in the field of teaching
research and development. The panel chairpersons, singly and together,
did remarkable jobs with the ambitious charge placed before them.
Special acknowledgments are due to Philip Winne of Stanford University
and to Arthur Young Company, for coordination and arrangements before,
during, and after the Conference. But in sum total it is the expert
panelists, each of whom made unique contributions in his or her
respective area, who must be given credit for making the Conference
productive up to the present stage. It is now up to the reader to carry
through the refinement that the panelists have placed in your hands.

Garry L. McDaniels
Program on Teaching and Curriculum

INTRODUCTION

Statement of Goal

As expressed by Panel 9, the goal for research methodology within the
context of research on teaching is:

    To improve the validity and utility of measurement, design, and
    analysis in research on teaching through the stimulation of new
    methodological knowledge and through the identification and
    translation of useful existing knowledge from other disciplines.

The Panel agreed that although much useful research on teaching has been
conducted, the value of some of the research has been limited because of
methodological problems. In some cases appropriate methodology was not
available; in other cases existing best practices were not followed.
There have also been cases where methodologies were borrowed from other
research disciplines without a careful examination of the assumptions
involved.

Thus, the intent of the Panel was to identify as many as possible of
the methodological problems which limit the productivity of research on
teaching. Because of the breadth of the area considered, the time
constraints, and the limited number of panel members, it is likely that
important problems are omitted in the present report. Even those
problems identified are described with varying degrees of specificity.
It is hoped, therefore, that this document will stimulate productive
written criticisms as to the relevance of the problems identified, the
adequacy of the descriptions of those problems, and the identification
of important problems that were omitted.

Issues and Dimensions of the Panel's Work

One of the first concerns of the Panel was that of how to identify
and discuss potential solutions to methodological problems without the
context of a specific research project. One suggestion was to identify
methodological problems that appear to cut across much of the research
on teaching. Some of these were relatively easy to identify from the
numerous reviews of literature that have been critical of past research.

Another suggestion was to use categories of research on teaching
as the contexts for discussion. Some dimensions useful for categorizing
research on teaching are:

Types of variables, e.g.,

• Variables antecedent to learning situations
• Variables which describe the process of the teaching/learning
  situation
• Contextual variables
• Variables which describe outcomes of the learning situation

Types of learning environments, e.g.,

• One-on-one tutorial
• Structured classroom
• Open classroom

Participants to be measured, e.g.,

• Students
• Teachers
• Parents

The Panel concluded that "generic" methodological issues would serve
as a starting point, but that all discussions would refer in general
terms to the above dimensions. In addition, progress reports from other
panels and discussions with members of other panels were used as
vehicles for insuring that the methodological concerns were relevant to
the needs of research on teaching.

General Discussion of Approaches

The Panel agreed upon four general Approaches for achieving its stated
goal:

Approach 9.1  Develop and test new design and analysis strategies
              appropriate for research on teaching.

Approach 9.2  Increase understanding of existing measurement
              strategies for research on teaching and, where
              appropriate, develop new measurement strategies.

Approach 9.3  Identify, demonstrate, and disseminate methodologies
              from other research disciplines which appear to have
              merit for research on teaching.

Approach 9.4  Consider the utility of standards for improving
              methodological practice in research on teaching.

These four Approaches were adopted as a set believed to encompass all
the methodological problems of research on teaching and are, therefore,
necessarily broad. The first two Approaches emphasize the need for new
methodological developments which specifically address the needs of
research on teaching. These two Approaches consider problems of design
and analysis (Approach 9.1) and measurement (Approach 9.2), respectively.
Together, they cover the full range of new methodological developments.
Approach 9.3 is based on the recognition that existing methodologies
developed in other research disciplines may be relevant to research on
teaching, but are as yet untried in that context. Finally, Approach 9.4
is a response to the criticism that some research on teaching has
suffered from a failure to use the best existing methodology. It was
suggested to the Panel that a statement on standards of methodological
practice for research on teaching would be useful in alleviating this
problem.

APPROACH 9.1

DEVELOP AND TEST NEW DESIGN AND ANALYSIS STRATEGIES
APPROPRIATE FOR RESEARCH ON TEACHING

The development of principles for the design and analysis of studies
has a long history, much of it stimulated by problems of research in
specific fields. For example, during the early and middle parts of the
twentieth century, problems of analysis of agricultural data played an
important role in the development of techniques commonly used today.
Generally, there has been less input to this literature from education
and teaching than from agriculture, the biological sciences, etc. Even
today, we see many new developments coming from areas other than
education. For example, analysis of covariance and index of response
methodology have come primarily from agricultural problems. The Panel
felt that it is time for more systematic efforts toward the development
of principles for the design and analysis of studies within the special
and possibly unique context of problems of education, generally, and the
study of teaching, in particular.

Perhaps the major impression left by reviews of current research on
teaching is that problems of design and analysis are encountered at many
stages, and are solved, if at all, in an imitative or derivative fashion
drawing on analogies with earlier studies, especially those in
agriculture. The current need is to treat seriously the unique problems
posed by attempts to describe and relate processes of teaching to types
of outcomes of teaching. To do so, serious attention will have to be
paid to many problems of measurement (which are considered in a separate
approach) and to the development of new design and analysis procedures.

Much is known, especially at the theoretical level, about
characteristics of various design and analysis procedures. What is
missing, however, is more detailed knowledge of specific applications to
research on teaching and of the limitations of the usefulness of the
procedures within that research context. In general, it is understood
that a doctrine of specificity applies to problems of design and that
this doctrine requires that designs be developed for particular
situations and inquiries. It is true, however, that at least rough
categories of types of applications can be developed and used as guides.

As might be expected, several of the programs within Approach 9.1
reflect the ongoing debate about designs and analyses useful for
investigating causal relationships. Clearly the most convincing evidence
comes from designs which include variable manipulations controlled by
the experimenter. And for much of the research on teaching it was
generally agreed that arguments for causal relations are strengthened
when random assignment of subjects to levels of an independent variable
is accomplished. Still, history suggests that such designs are difficult
to implement, particularly when the subjects are people. A better
understanding of how to implement such designs is needed.

Nevertheless, researchers will continue in their attempts to
"tease out" causal relationships from correlational data. When
cautiously interpreted, the results from correlational designs can be
useful. However, analytical models that support such efforts are not
as yet fully understood, and for some designs more useful models may be
developed.

Several other programs within this Approach reflect the Panel's
concern with the interpretation and generalization of results. For
example, one program was concerned with the problem of introducing
explicitly into both design and analysis the use of prior and collateral
information about the context and participants of a study, information
which can, if successfully used, yield more efficiently designed studies.
Another program was aimed at the development of methods for making
research on teaching a cumulative enterprise. As Light and Smith (1971)
have observed, significant knowledge in the social sciences accrues ever
too slowly. A major reason is that various research studies on a
particular question tend to be of dissimilar designs, making their
results difficult to compare. An even more important factor is that
social science studies frequently produce conflicting results, which
hinder theoretical developments and confuse those responsible for the
implementation of social policies. At a minimum, what is needed are
(a) criteria for determining when data from dissimilar studies can be
pooled, and (b) methods for recognizing fundamental differences in
research design, and avoiding the creation of artificial differences.

This Approach is intimately related to Approach 9.2, which is aimed
at increasing the understanding of existing measurement strategies for
research on teaching and, where appropriate, developing new measurement
strategies. In addition, this Approach receives direction from the
problem areas of all other panels in the Conference. For example,
problems of selection (Panel 1) involve estimation of statistical
relations; problems of conceptualizing and observing teaching
(Panels 2-6) involve sampling; and problems of theory development
(Panel 10) involve consideration of the roles of data.

Program 9.1.1: Analysis Problems Related to Hierarchically-Nested Data

Much of the data in educational research is hierarchically nested
(Porter, 1973). For example, students are nested within classrooms which
are, in turn, nested within schools. Such hierarchical nestings give
rise to a variety of methodological problems.

Project 9.1.1.1: Models for Estimating Relations among Variables
at a Lower Level of Aggregation. Given data on a set of aggregate units,
what models are useful in the estimation of relations among variables for
subunits (Iverson, 1974; Robinson, 1950)?
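
A small numerical sketch may make the aggregation problem concrete. The
scores below are invented for illustration (three hypothetical
classrooms of three students each), and the helper `pearson` is an
assumption of this sketch, not a procedure taken from the cited papers.
The data are arranged so that the relation among classroom means and the
relation among students within classrooms point in opposite directions:

```python
from math import sqrt

def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (x, y) scores for students nested within classrooms.
classrooms = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Aggregate level: one pair of means per classroom.
mean_x = [sum(x for x, _ in rows) / len(rows) for rows in classrooms.values()]
mean_y = [sum(y for _, y in rows) / len(rows) for rows in classrooms.values()]
r_between = pearson(mean_x, mean_y)

# Subunit level: students within each classroom, and all students pooled.
r_within = {
    name: pearson([x for x, _ in rows], [y for _, y in rows])
    for name, rows in classrooms.items()
}
all_rows = [row for rows in classrooms.values() for row in rows]
r_pooled = pearson([x for x, _ in all_rows], [y for _, y in all_rows])

print(r_between, r_within, r_pooled)
```

With these invented scores the correlation among classroom means is 1.0,
every within-classroom correlation is -1.0, and the pooled student-level
correlation is 0.8, so the aggregate data alone would give a misleading
picture of the relation among students.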
Project 9.1.1.2: Models for Data Aggregation. How should aggregation
proceed when measurements are taken on several variables for units
at one level, but the researcher wishes to aggregate both across variables
and across units to a higher level unit? If one first aggregates across
variables and then across units, results can (and frequently will) differ
from those obtained when aggregation across units precedes aggregation
across variables. Are there contexts and purposes when one order of
aggregation is more useful than another?
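
The order effect can be seen in a toy case. The sketch below assumes a
hypothetical nonlinear composite (a student's composite is the weaker of
two subtest scores) and invented scores for two students; neither comes
from the Panel's report. With a plain average as the composite and
complete data, the two orders agree, which suggests the difference is
tied to the form of the aggregation rule:

```python
# Hypothetical subtest scores (variable 1, variable 2) for two students
# in one class.
scores = [(10, 2), (2, 10)]

def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)

# Order 1: aggregate across variables (weaker subtest per student),
# then across units (class mean of the composites).
variables_then_units = mean([min(student) for student in scores])

# Order 2: aggregate across units (class mean per variable),
# then across variables (weaker of the two class means).
class_means = [mean(column) for column in zip(*scores)]
units_then_variables = min(class_means)

# With a linear composite (the plain average) the two orders agree here.
linear_order_1 = mean([mean(student) for student in scores])
linear_order_2 = mean(class_means)

print(variables_then_units, units_then_variables)  # 2.0 6.0
print(linear_order_1, linear_order_2)              # 6.0 6.0
```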

Project 9.1.1.3: Analysis of Unbalanced Designs. What methods
are most useful for analyzing data from unbalanced hierarchically-nested
designs?

Project 9.1.1.4: Consequences of Violating Assumptions of
Independence. What are the consequences for various interval estimation
procedures of violating the assumption of independence because of
an incorrect choice of the unit of analysis (not appropriately
specifying the aggregate units in the analysis model)?
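
One way to get a feel for the size of the problem is the familiar
variance-inflation approximation for clustered samples. The sketch below
rests on assumptions that are not part of the Panel's text: classes of
equal size m, a common intraclass correlation rho among students in the
same class, and interest in the overall mean. Under those assumptions
the sampling variance of the mean is larger than the
independent-observations formula by the factor 1 + (m - 1) * rho. The
function names and the study dimensions are hypothetical:

```python
from math import sqrt

def design_effect(class_size, rho):
    """Variance inflation for a mean when students are clustered in
    equal-sized classes with intraclass correlation rho."""
    return 1 + (class_size - 1) * rho

def interval_widening(class_size, rho):
    """Factor by which a naive student-level interval would need to be
    widened to reflect the clustering."""
    return sqrt(design_effect(class_size, rho))

def effective_sample_size(n_students, class_size, rho):
    """Number of independent observations the clustered sample is worth."""
    return n_students / design_effect(class_size, rho)

# Hypothetical study: 20 classes of 25 students, rho = 0.2.
deff = design_effect(25, 0.2)
n_eff = effective_sample_size(500, 25, 0.2)
print(round(deff, 2), round(n_eff, 1))
```

In this invented case the 500 students carry roughly the information of
86 independent observations, so an interval computed with the student as
the unit of analysis would be far too narrow.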

Project 9.1.1.5: Analysis of Non-Independent Student Data. Many
instructional situations apply a "treatment" to a class of individual
students. The classic methods of analyzing an experiment for comparing
different treatments can be used with the classroom as the unit of
analysis, and the conventional probability statements can be meaningful
when it is possible to assign treatments at random to classrooms.
Although the students have not been treated independently, their
individual scores can contain useful information. What analyses are
possible to utilize this information? What models and assumptions would
be necessary to permit a valid analysis using individual scores?



Program 9.1.2: The Utility of and Methods for Conducting "True Experiments"
in Research on Teaching

There was a strong consensus within the Panel that to understand the
effect of an aspect of teaching it is necessary to manipulate that aspect.
This requires an active role on the part of the researcher which might
best be accomplished by randomly assigning participants to conditions of
interest (Campbell, 1971). Although variable-manipulation studies are
frequently labelled experiments, the word experiment is also used more
broadly. Because of the importance of variable manipulation to the
future productivity of research on teaching, the Panel recommends clearer
language in the research literature: specifically, the adoption of
standard terminology which communicates clearly that a study has
manipulated the variable of major interest through random assignment.

Project 9.1.2.1: Use of Incentives for Participation in "True
Experiments." This project would examine the use of incentives to
encourage participation in variable-manipulation investigations for
research on teaching.

Project 9.1.2.2: Ethical Issues in Conducting "True Experiments."
This project would consider ethical issues where random assignment can
infringe upon the rights of participants in an experiment:

1. Denial or temporary deferral of treatment to persons in need
   of it as a consequence of the use of random assignment;

2. Compromising the participant's right of informed consent to
   participate or not.

Project 9.1.2.3: "True Experiments" within Quasi-Experiments.
This project would examine alternative procedures for embedding small
randomized studies within large ongoing nonrandomized studies. Campbell
and Stanley (1963) have considered some possibilities, but more work
seems to be needed.

Program 9.1.3: Data Analysis Procedures for Quasi-Experimental or
Correlational Studies

Much research on teaching consists of selecting a number of classrooms,
testing the students on some criterion variable before and after
instruction, and relating those scores to the type of instruction.
Information from this research strategy may be useful for understanding
the instructional process and for suggesting hypotheses for "true
experimental" research. Two problems, however, are evident: (a) How
should the data be analyzed; and (b) What is the utility of the results?

The confusion about methods of analysis stems from at least two
concerns. First, since pupils have not been assigned to classes at
random, the posttest scores are usually adjusted for pretest scores
on one or more measures. Historically, several methods of adjustment
have been used. One method adjusts on a separate within-class regression
equation for each class. This method is not as restrictive as some
in terms of assumptions, but it ignores the collateral information
available from similar classes. Another method uses a pooled
within-class regression line for adjustment. A third method ignores the
individual scores and merely uses mean posttest scores and pretest
scores across classes.
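
The three methods can give quite different regression slopes, and hence
different adjustments, on the same data. The following sketch uses
invented pretest and posttest scores for two hypothetical classes; the
function names are assumptions of the sketch, and the least-squares
formulas are the ordinary textbook ones rather than any procedure
specified by the Panel:

```python
def cross_products(xs, ys):
    """Sum of cross-products and sum of squares about the group means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy, sxx

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    sxy, sxx = cross_products(xs, ys)
    return sxy / sxx

# Hypothetical (pretest scores, posttest scores) for two classes.
classes = {
    "A": ([1, 2, 3], [2, 4, 6]),
    "B": ([4, 5, 6], [5, 5.5, 6]),
}

# Method 1: a separate within-class regression for each class.
separate = {name: slope(pre, post) for name, (pre, post) in classes.items()}

# Method 2: one pooled within-class slope.
parts = [cross_products(pre, post) for pre, post in classes.values()]
pooled = sum(sxy for sxy, _ in parts) / sum(sxx for _, sxx in parts)

# Method 3: regression on class means only, ignoring individual scores.
mean_pre = [sum(pre) / len(pre) for pre, _ in classes.values()]
mean_post = [sum(post) / len(post) for _, post in classes.values()]
between = slope(mean_pre, mean_post)

print(separate, pooled, between)
```

Here the separate slopes are 2.0 and 0.5, the pooled within-class slope
is 1.25, and the slope through the class means is 0.5, so the choice of
method is not a neutral one.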

Second, other aspects of data from such studies are often ignored.
Two examples are (a) the possibility that teaching strategies may
affect the slopes of the within-class regression lines themselves; and
(b) the possibility that the performance of a class may be affected by
the distribution of pretest scores.

It is evident that different analyses reflect different
conceptualizations and models. The confusion over which analysis is
"best" stems from a lack of making explicit the underlying model and
relating it to the purpose of the study. A full explication of models
and analyses appropriate for studies of teaching of this type is needed.

Finally, it must be realized that quasi-experimental studies of this
type are not as useful for inferring direct cause and effect as "true
experiments," but they can suggest models which may be useful in
understanding the teaching-learning process. Still, the relative utility
of the two approaches (quasi-experiments and true experiments) is not
well understood or agreed upon.



Project 9.1.3.1: Adjusting on Multiple Fallible Covariables.
Researchers often wish to adjust outcome variables for differences across
conditions on some set of antecedent variables. A specific example is
found in attempts to assess teaching performance in terms of student
outcomes. Such adjustment is of interest because students are typically
not randomly assigned to teachers. One model for making adjustments
is to use the structural relations defined on the latent true variables.
Cochran (1968) has provided useful statements about the relationships
between least squares estimates of regression coefficients and the
coefficients defined on the underlying latent true variables. Econometricians



NIE Conference on Studies in Teaching



have provided a variety of methods for estimating the structural
relations given fallibly-measured variables. At least one useful solution
exists for a single fallible covariable (Porter & Chibucos, 1974). What
remains to be done is to incorporate the knowledge about estimating
structural relations of multiple fallible covariables to predict one or
more dependent variables with the subsequent analysis of adjusted outcomes.

Program 9.1.4: Development and Exploration of Formal Models for
Incorporating Information about the Extent of Implementation of Teaching
Strategies into the Evaluation of Those Strategies in Terms of Outcomes

A critical question in the evaluation of teaching strategies is the
extent to which the strategies were implemented by the teachers.
Clearly, a strategy can look ineffective simply because it was not used,
yet this possibility may be overlooked in an evaluation that concentrates
on student outcomes. The problem of measuring implementation is addressed
later in Program 9.2.8. Program 9.1.4 focuses on how to formally
incorporate implementation data into the evaluation of strategies in
terms of outcomes.

Analytic models that can be used to predict outcomes for a variety
of levels of implementation are needed. Such models would help
researchers unconfound the effect of level of implementation from the
effect of the strategy given full implementation. For example, if
Strategy A at the observed level of implementation has better outcomes
than Strategy B across all levels of implementation, the conclusions are
clear. If a more fully implemented Strategy B might exceed Strategy A in
outcomes, however, then the researcher might want to concentrate on
methods for improving the implementation of Strategy B.
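
A minimal sketch of such a model follows. It assumes, purely for
illustration, that outcome is a straight-line function of an
implementation index running from 0 (not used) to 1 (fully used), and it
uses invented class-level data; the Panel does not propose this
particular form. Extrapolating a fitted line beyond the observed levels
of implementation is itself a strong assumption:

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def predicted_outcome(data, level):
    """Outcome predicted at a given implementation level."""
    intercept, b = fit_line(*data)
    return intercept + b * level

# Hypothetical data: (implementation levels, mean class outcomes).
strategy_a = ([0.8, 0.9, 1.0], [58, 59, 60])   # well implemented
strategy_b = ([0.2, 0.3, 0.4], [44, 48, 52])   # poorly implemented

observed_a = sum(strategy_a[1]) / len(strategy_a[1])
observed_b = sum(strategy_b[1]) / len(strategy_b[1])
full_a = predicted_outcome(strategy_a, 1.0)
full_b = predicted_outcome(strategy_b, 1.0)

print(observed_a, observed_b, round(full_a, 1), round(full_b, 1))
```

In this invented case Strategy A looks better as observed (59 against
48), while the fitted lines suggest that Strategy B would do better if
fully implemented (about 76 against 60).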

Program 9.1.5: Investigation of the Utility of Longitudinal (Time-Series)
Designs for Various Types of Research on Teaching and Concomitant Analytic
Problems

Longitudinal data-collection efforts are sometimes held as a panacea
for research on teaching. Although this is not likely to be the case,
the question remains: For what research questions are longitudinal designs
necessary? In addition, a variety of problems with the analysis of
longitudinal designs appear to require further work, e.g., changing
metrics of the dependent variables over time, unevenly spaced time
points, and methods for collapsing data across time points. Glass (1972)
and Anderson (1971) have done recent work on some related problems.

Program 9.1.6: Empirical Selection of Models of the Teacher-Student
Interaction Process

If researchers have difficulty specifying an underlying model of
the teacher-student interaction process, what sorts of statistical
procedures can be used to choose among several competing models? More
specifically, what are some useful alternatives to the least squares
criterion?

Program 9.1.7: Procedures for Combining the Results of Related Studies
over Time

How can studies of teaching formally build into future designs the
results of earlier studies so that future studies can be more effective
or powerful?

Program 9.1.8: Procedures for Studies of Teacher Effectiveness

Prior to consideration of the design and analysis procedures for
research on teacher effectiveness, a caveat is necessary. Several
members of the Panel questioned the utility of such research even given
satisfactory methodology. The reasoning was that numerous past efforts
have not been productive. Many teacher characteristics have been shown
to be unstable over time and those that are stable appear to be unrelated
to student outcomes. In addition, studies which simply attempt to
establish that teachers do have consistent effects over time do little to
help understand the causes of those effects (Rosenshine, 1970; Brophy,
1973; Acland, 1974). The Panel concluded, however, that improved
methodology would rule out one rival explanation for the lack of utility
of such studies and might, therefore, be of value.

Given the above, the ideal strategy for studying the general question,
"Do teachers make a difference?" requires something close to the following
design. First, a large number of students would be randomized over
teachers. Then, class means and variances on some index of change, e.g.,
posttest scores or gain scores, would be compared across the N teachers.
This could be done over several years to determine whether there are
any consistent outliers. The existence of one or more outliers would
imply some structural difference in teacher effectiveness.

The analysis would be performed to examine each teacher's change
scores over the several years. Any teacher whose change scores
consistently fell above the average for all teachers would represent a
positive teacher effect. This is similarly true for negative effects. A
hodgepodge of above and below the mean results for most teachers would
indicate a lack of teacher differences. The intraclass correlation could
be used to detect consistent differences in teacher performance (Veldman
& Brophy, 1974).
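
As a rough illustration of how an intraclass correlation separates
consistent teacher differences from a hodgepodge, the sketch below uses
the one-way analysis-of-variance form of the coefficient. This is one
common estimator and is an assumption of the sketch, not necessarily the
form used by Veldman & Brophy; the change scores are invented, with one
row per teacher and one column per year:

```python
def intraclass_correlation(rows):
    """One-way ANOVA intraclass correlation.

    rows: one list per teacher, each holding that teacher's mean change
    scores for the same number of years.
    """
    n = len(rows)       # number of teachers
    k = len(rows[0])    # number of years per teacher
    grand = sum(sum(r) for r in rows) / (n * k)
    means = [sum(r) / k for r in rows]
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ss_within = sum((x - m) ** 2 for r, m in zip(rows, means) for x in r)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical change scores, expressed as deviations from the yearly
# average over all teachers (three teachers, two years).
consistent = [[1, 3], [-1, 1], [-3, -1]]   # same ordering in both years
hodgepodge = [[1, -1], [-1, 1], [1, -1]]   # above one year, below the next

icc_consistent = intraclass_correlation(consistent)
icc_hodgepodge = intraclass_correlation(hodgepodge)
print(icc_consistent, icc_hodgepodge)
```

The consistent pattern yields a coefficient of 0.6, while the hodgepodge
pattern yields a negative value, which this estimator produces when
teachers' yearly scores vary more within teachers than between them.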



Project 9.1.8.1: Problems Due to Lack of Random Assignment.
Suppose that because of political or administrative realities,
large-scale random assignment of children across many teachers for
several years is not possible. The question then emerges: How can the
broad program goal of searching for consistent teacher effects be
examined? This goal creates a need for some kind of sensible
"adjustment" to determine the change scores achieved by each teacher in
each year.

How should these adjustments be made? The answer is not obvious.
For example, one possibility would be to run a grand regression equation
using all the pretest-posttest scores for any given year. Then, for
each teacher, a residual (the observed minus predicted) final score could
be obtained. But this involves implicit assumptions about learning curves.
What precisely are these assumptions and are they reasonable? This
question is similar to that posed in Program 9.1.3.
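
The grand-regression adjustment can be sketched for a single year as
follows. The scores and teacher labels are invented, and averaging the
student residuals within each teacher is one plausible reading of "a
residual for each teacher," adopted here as an assumption. Note that the
sketch fits a single straight line to all students, which is exactly the
kind of implicit assumption the text asks about:

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical (pretest, posttest) scores for one year, keyed by teacher.
scores = {
    "teacher_A": [(1, 3), (2, 4), (3, 5)],
    "teacher_B": [(1, 1), (2, 2), (3, 3)],
}

# Grand regression over all students, ignoring teacher membership.
all_pairs = [pair for pairs in scores.values() for pair in pairs]
intercept, b = fit_line([pre for pre, _ in all_pairs],
                        [post for _, post in all_pairs])

# Mean residual (observed minus predicted posttest) for each teacher.
mean_residual = {
    teacher: sum(post - (intercept + b * pre) for pre, post in pairs) / len(pairs)
    for teacher, pairs in scores.items()
}
print(intercept, b, mean_residual)
```

With these invented scores the grand line has intercept 1 and slope 1,
and the two teachers receive mean residuals of +1 and -1. Repeating the
computation for each of M years would give the set of M residuals per
teacher described next.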

Now, suppose this method of obtaining residuals was applied over
several years, fitting a new grand regression equation each year, and
computing each of the N teachers' residuals from the new regression each
year. This process would lead to a set of M residuals for each teacher
over M years. Once again, the intraclass correlation would be useful to
determine whether consistent differences in teacher performance can be
detected. A high, positive correlation implies strong, consistent
differences in teacher effects.

Project 9.1.8.2: Following Students over Time. The description of
the program for searching for teacher effects has so far considered each
year's change score within each of the N classrooms as an independent
entity. This approach ignores an important question: If a group of
students for one teacher has a mean change score in Year 1 that far
exceeds the mean change score over all N teachers, what happens to those
students in the following years? Do they maintain their lead? Do they
increase it? Or does that lead dissipate? Answering this question
would require following students over time. One design would be to
keep each group of students in any class in Year 1 together for several
subsequent years. This would tend to preserve any contextual effects
of students interacting positively with one another. A second strategy
would be to break up the classes from one year to the next. If this
breaking up were done randomly, new information could be developed about
other teachers in future years. Several questions must be dealt with
here, and a thoughtful consideration of the implications of alternative
designs would be useful.

Project 9.1.8.3: Procedures for Combining Several Intraclass 
Correlations into a Single Estimate. Assume the earlier projects in this 
program have been completed; i.e., assume we have available an intraclass 
correlation coefficient based on M years of data from N teachers. Then, 
the value of this coefficient will give information about differential 
teacher effects. But the correlation coefficient would be coming from a 
single study, for example, in a single city. Imagine that, because of 
interest in getting good multisite (multicity or multischool) data, a 
similar study is conducted in each of R cities. This gives us R 
intraclass correlation coefficients that may well be based on different 
sample sizes. What is the most effective way of combining the set of 
R intraclass correlation coefficients into an overall estimate (Votaw, 
1948; Olkin, 1965)? 

There are at least three alternative ways of combining the data 
from the R studies. First, the raw data could be pooled. Second, the 
median of all the intraclass correlations could be computed. Third, 
Fisher's Z transformation, which is simply a function of the 
correlations and their associated sample sizes, could be used. Is one 
procedure always preferable to the others? 
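
Two of these alternatives, the median and a Fisher Z combination, can be sketched as below. The sketch rests on assumptions not stated in the report: it uses the ordinary Fisher transformation z = atanh(r) with weights of n - 3, which is the usual large-sample choice for product-moment correlations. The transformation appropriate to an intraclass correlation also depends on the class size, so this should be read as an illustration of the idea, not as the recommended estimator. The function names are hypothetical.

```python
import math
from statistics import median

def combine_by_median(correlations):
    """Overall estimate taken as the median of the R site coefficients."""
    return median(correlations)

def combine_by_fisher_z(correlations, sample_sizes):
    """Weighted mean on the Fisher z scale, transformed back to r.
    Assumption: weights of (n - 3), the usual choice for product-moment
    correlations; intraclass correlations may call for other weights."""
    zs = [math.atanh(r) for r in correlations]
    weights = [n - 3 for n in sample_sizes]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)
```

When all sites share one true coefficient, the two functions should give similar answers; they can diverge when the site coefficients are widely spread or when sample sizes differ greatly.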

Probably, each procedure has a setting in which it is most effective. 
A reasonable guess is that the most effective procedure depends 
upon an assumption about the form of the population of correlation 
coefficients that arise from different sites. For example, if one 
assumes that all sites have a true underlying coefficient and that this 
coefficient is an identical parameter over all R sites, one method may 
be best. A second circumstance involves assuming some distribution of 
true coefficients over the R sites. Then, the best way of combining the 
R observed coefficients may well depend upon the distribution of true 
coefficients. If so, what procedures are useful for describing the 
distribution? A final case would be that in which researchers develop 
a series of R estimated coefficients and we have a modest prior 
probability that several of them are outliers. In this event, depending upon 
our prior estimate of both the probability of an outlier and also its 
estimated magnitude, we would probably want to weigh outlying 
observations less than coefficients clustering around a measure of central 
tendency. 

Program 9.1.9: A National Study of Current Educational Practice Analyzed 
at the Behavior-Setting or Organization-of-Instruction Level 

Research on teaching has been conducted in a wide variety of 
settings and types of classrooms and schools. An important question about 
most research on teaching is the extent to which the conditions studied 
are representative of a larger population. In reading and conducting 
research on teaching and instruction, therefore, it would be useful to 
have knowledge of the distribution of basic types of educational 
practice. One could be interested in such an issue at many different levels: 
district organization, subject matter coverage, etc. In this program, 
however, interest centers on the organization of classroom settings, 
i.e., the instructional organization (Gump, 1967). 

Many researchers have intuitive hunches about the distribution of 
instructional organization, that is, about how typical or atypical a 
particular situation is. But data which speak directly and 
systematically to this descriptive end are not available. For example, how 
frequently does recitation as an instructional setting occur in high 
schools? In elementary schools? How often does free choice of activity 
or individual work occur in high schools? In elementary schools? 

Knowledge about instructional organization is important because it 
relates to the behavioral options of teachers and students; behavior 
setting structure has been shown to be systematically related to 
philosophical curricular differences (Grannis, 1973). If one had 
knowledge of the instructional organizations of a representative sample of 
schools, generalizations about teaching procedures could be more 
systematically related to other factors, for instance, to interpreting 
evaluation outcomes. 

In addition to its immediate purposes, such a study would 
facilitate systematic sampling plans, policy decisions, and historical research, 
particularly if it could be efficiently collected periodically. Such a 
survey would provide a convenient way of getting evidence about 
educational innovation and change. 



Project 9.1.9.1: Development of Behavior-Setting Types. There is 
need to develop an inclusive set of behavior-setting types or 
instructional types for use in future studies. These typologies could be based 
on a small empirical study of classrooms, and reviews of literature and 
concepts (Gump, 1967; Grannis, 1973). 

Project 9.1.9.2: Economical Ways of Acquiring Information on 
Behavior Settings. In the past, behavior settings have been studied by 
direct observation, which is costly. There is a need to compare the 
validity and reliability of behavior-setting (type of instructional 
organization) information obtained via teacher questionnaire and direct 
observation of classrooms. The aim would be to develop and validate an 
economical means of obtaining reliable and valid information for a 
national study. 
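
One way such a comparison might be summarized is with a chance-corrected agreement index between the setting type reported on the questionnaire and the setting type recorded by an observer for the same class periods. The sketch below uses Cohen's kappa, which is a choice made here for illustration and is not named in the report; the category labels in the usage note are likewise hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two classifications of the same
    units, e.g., questionnaire-reported vs. observed setting type."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Agreement expected by chance from the two sets of marginal proportions.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # both sources used a single category throughout
    return (observed - expected) / (1.0 - expected)
```

For example, `cohens_kappa(["recitation", "seatwork"], ["recitation", "seatwork"])` indicates perfect agreement. A value near zero would suggest the questionnaire agrees with observation no better than chance, which would argue against substituting it for direct observation.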

Project 9.1.9.3: National Survey of Classroom Behavior Settings. 
The objective of this project would be to conduct a national study of 
classrooms at various educational levels to ascertain the distribution 
of various instructional organizations within and across schools, 
districts, etc. The survey would utilize the strategies developed in 
Projects 9.1.9.1 and 9.1.9.2. Grade levels, subject matter, etc., should 
be included as relevant information. 

APPROACH 9.2 

INCREASE UNDERSTANDING OF EXISTING MEASUREMENT STRATEGIES 
FOR RESEARCH ON TEACHING AND, WHERE APPROPRIATE, DEVELOP 
NEW MEASUREMENT STRATEGIES 



There is a long and productive history of psychometrics, which has 
supplied theory and guided test construction for research on teaching. 
Much of this history, however, has concerned measuring the 
aptitudes and achievement of individual students. Although this work 
has been, and will continue to be, of value to research on teaching, 
other aspects of measurement appear to need greater attention. For 
example, better measures of so-called noncognitive outcomes of 
teaching, including personality characteristics, self-perceptions, values, 
and attitudes are required. There is a need for better theories about 
such constructs, but development of measures is also constrained by the 
need for better methodology. A second example is the need for better 
measures of the teaching process, particularly in natural settings. 
A third example is the need for group assessment measures as contrasted 
with measures designed to assess individual differences. 



Current and pending legislation has given a sense of urgency to 
the need for assessing effects of teaching. Thirty-one states are now 
considering laws requiring all applicants for a teaching license to 
demonstrate their teaching effectiveness. One example is the Stull 
Act, effective in 1972 in California, which requires all school 
districts to evaluate their teachers. Many of these evaluations will be 
based on student outcomes, yet existing measures of student outcomes 
are largely restricted to cognitive achievement and aptitudes. Even 
these measures may not be appropriate since most were designed to 
distinguish among individuals (students), not groups (classrooms). 

The programs within this Approach can be roughly categorized as 
dealing with concerns for measuring dimensions of the process of teaching 
or of the outcomes of teaching. There are several motivations for 
measuring dimensions of the process of teaching. First, knowledge about 
what actually takes place in a learning situation is useful in 
stimulating new theories about teaching strategies. Second, much research is 
devoted to providing teachers with new strategies believed to 
facilitate student learning. If student outcomes do not reflect the attempts 
to change teaching strategies, then there are at least two explanations. 
One is that the strategies were not effective, and the second is that 
the strategies were not implemented by the teachers. Better measures 
of the teaching process are necessary to narrow the alternative 
explanations. 

With respect to measurement of outcomes, there is a need to develop 
or select measurements which are valid for assessing the effectiveness 
of an intervention. This need stems from the inappropriateness of many 
current and widely used standardized achievement tests. These measures 
are inappropriate for assessing teacher (and curriculum) effects for a 
variety of reasons: 

1. They were not designed to measure the outcomes of 
interventions. 

2. They tend to measure relatively stable characteristics. 

3. Functionally, the major purpose of these tests is the 
sorting and selection of individual students. 

Recent efforts have been at least partially responsive to the 
above-outlined measurement needs. First, numerous classroom 
observation instruments have been developed to measure the teaching process, 
and some useful data banks describing classroom activities are now 
available, e.g., the SRI Follow Through classroom observations. 
Nevertheless, the properties of existing observation schedules are generally 
not well understood, and problems of validity and reliability remain. 
Second, the recent surge in the development and use of 
criterion-referenced measures should alleviate some of the concerns about 
existing achievement measures. Still, most of the work is concentrated on 
assessing individual student performance, while one of the major needs 
for research on teaching is to assess the impact of interventions. 

As stated previously, this Approach is related to Approach 9.1, 
to develop and test new design and analysis strategies. Clearly, 
the reliability and validity of measures can limit the utility of a 
research study. Design and analysis strategies must be sensitive to 
the weaknesses of the measures, but they cannot turn useless data into 
useful data. There is some reason to believe from recent literature 
that concerns for solutions to design and analysis problems have 
overshadowed concerns for solutions to problems of measurement. If so, 
this imbalance should be corrected. 

Program 9.2.1: Educational Significance of an "Effect" 

Historically, the issue of what an instrument measures has been 
approached from two points of view: (a) the content of the instrument 
(face or content validity) and (b) the interrelationships between the 
instrument and other variables (predictive, concurrent, or construct 
validity). Typically, these points of view have not been used differently 
for various types of measuring instruments, e.g., for norm- vs. 
criterion-referenced tests or multidimensional vs. single-trait tests. Though it 
is not clear whether such a differentiation should be made, it seems 
reasonable to think about the conditions under which the two approaches 
are most useful. 

The two validity approaches are similar in that both beg the issue 
of causality. They differ, however, in that correlation studies of the 
interrelationships between the instrument and other measured behaviors 
depend upon existing distributions of scores on all variables considered. 
This last point involves the distinction between an "effect" defined as 
a difference between two points on a natural scale and a standardized 
measure of the effect. Standard deviation units, correlations, and 
percentages of variance explained are examples of standardized metrics 
which have been used to define the educational significance of an "effect." 
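
The dependence of a standardized metric on the existing distribution of scores can be illustrated with a small sketch. It computes the same raw difference between two groups and then expresses it in pooled standard deviation units. The function names and the scores used in the usage note are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import mean, variance

def raw_effect(treated, control):
    """Difference between group means on the instrument's natural scale."""
    return mean(treated) - mean(control)

def standardized_effect(treated, control):
    """The same difference expressed in pooled standard deviation units."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return raw_effect(treated, control) / math.sqrt(pooled_var)
```

With scores of `[13, 15, 17]` against `[8, 10, 12]`, and again with `[5, 15, 25]` against `[0, 10, 20]`, the raw effect is 5 points in both cases, but the standardized effect is 2.5 in the first and 0.5 in the second, solely because the spread of scores differs.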

The first purpose of this program is to suggest strategies for 
assigning meaning to measurement, that is, strategies which are independent of the 
original distribution of the measurements (Porter & McDaniels, 1974). A 
secondary purpose is to attempt to give meaning to the "impact" of an 
intervention through the mechanism of giving meaning to the particular 
measures used to assess the outcomes of the intervention. In a sense 
then, the function of this program is to move the field away from defining 
"educational significance" of an effect as, say, a one-half standard 
deviation difference between an "experimental" and a "control" group, the 
standard used in the Westinghouse-Ohio evaluation of Head Start (1969). 
It is also intended to move the field away from defining "educational 
significance" as a statistically "significant" difference. Instead, it 
is intended that the field begin defining the "educational significance" 
of an effect in terms of either the measured consequences of the size 
of the effect for that instrument or the content validity of the instrument 
and the chosen criterion level. 

Two projects are suggested. The first is to explore the possibility 
of defining the meaning of the size of difference between two points on a 
natural scale empirically by estimating the impact of a change from one 
point to another on a broad range of other possible concurrent and future 
outcomes. This strategy will be labelled as indirect validation. The 
second project involves the determination of the meaning of particular 
criterion levels on instruments and is designed to provide direct 
understanding of a phenomenon through content validity. 



Project 9.2.1.1: Indirect Validation. The "size of an effect" is 
defined in terms of the raw score difference between two points on a scale. 
For example, this might translate into the difference between means. What 
is called for is to give meaning to effects of different sizes by relating 
those effects to other measured aspects of a person's behavior or experience. 
Thus, how does a ten-point difference in Binet IQ scores relate to 
differences in one's chances of attending a college or one's being assigned 
to a special remedial class or one's future income? Here, meaning would be 
given to the size of effect through its relationships with other outcomes. 
In the context of no intervention, this would translate into giving empirical 
meaning to a particular distance on a particular measuring instrument 
(differences in scores on IQ tests take meaning from predicted differences 
on other outcomes). As a start, a limited set of widely used instruments 
might be studied, e.g., the Stanford-Binet Intelligence Scale and the 
Metropolitan Achievement Tests. Existing data could be used to attempt to 
give meaning to the instruments and new data could be suggested where 
necessary. 

The following issues should be considered in carrying out the project: 

1. Would use of a standardized measure of effect (such as explained 
variance, correlation, or standard deviation units) yield the 
same kinds of conclusions as the indirect validation approach? 
Under what conditions do these two approaches lead to different 
conclusions? 

2. In the context of giving meaning to the size of an effect of an 
intervention, does a single score distance translate into 
different sizes of effect for different contexts and populations? 

3. Consider the same problem as 2 for giving meaning to a raw score 
distance where difference in size of effect may not be 
attributed to a particular intervention. 

4. Consider the problem that "effects" of the same size at different 
points on a scale may have to be assigned different meanings 
depending on the context and population. For example, in Boston, the 
cut-off for assignment of students to special classes is an IQ 
score of 80. In this situation, an intervention which results in 
a two-point change in IQ scores has different meaning if the 
change is from 79 to 81 than if the change is from 104 to 106. 

5. Consider the possibility that two interventions, each raising IQ 
scores by 10 points (say, from 100 to 110) on a short-term 
outcome measure, may have very different meanings if the two increases 
in scores are accompanied by changes in different characteristics 
and, therefore, by different impacts on other outcomes. 
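
The cut-off example in issue 4 can be made concrete with a very small sketch. It assumes, for illustration only, that scores strictly below the cut-off lead to special-class assignment; the report gives the cut-off of 80 but not how a score of exactly 80 is treated. The function names are hypothetical.

```python
def placement(score, cutoff=80):
    """Hypothetical assignment rule: below the cut-off means special class.
    Assumption: a score equal to the cut-off counts as regular class."""
    return "special class" if score < cutoff else "regular class"

def change_alters_placement(before, after, cutoff=80):
    """True if a score change moves the student across the cut-off."""
    return placement(before, cutoff) != placement(after, cutoff)
```

Under this rule, `change_alters_placement(79, 81)` is true while `change_alters_placement(104, 106)` is false, although both changes are two points in size. The meaning of the "effect" therefore depends on where on the scale it occurs.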

Project 9.2.1.2: Direct Validation. This project would describe existing 
measuring instruments and particular criterion levels in terms of a 
theoretically-based understanding of the content of the measuring instruments. 
The intent of the project is to give meaning to an instrument by describing 
what the instrument requires of the respondent in terms of knowledge or 
skills. Thus, the test and criterion level would be used in a theoretical 
framework to give direct meaning to reaching or failing to reach criterion 
on the instrument. 

The following issues should be considered in carrying out the project: 

1. Consider the logic of the test as well as other 
characteristics. For example, in reading it would be useful to 
differentiate among the following: (a) labored decoding skill, 
(b) fluent decoding skill, (c) understanding of the logic, 
syntax, and internal structure of discourse, and (d) extent 
to which the respondent shares the concepts and purposes of 
the test constructor. 

2. In the context of interventions, consider the possibility of 
using this direct validation strategy to assess interventions 
without reference to comparison groups. 

3. For a given instrument, consider the meaning of different 
criterion levels (a) for a single context and population, 
and (b) across different contexts and populations. Use 
existing data where possible and suggest new data where 
necessary. 

Program 9.2.2: Analysis of the Desirable Properties of Tests Stratified 
by the Purposes of the Tests 

What are the desirable properties of tests serving different purposes? 
Some examples of tests having different purposes are: 

Mastery tests -- These lead to dichotomous decisions (Harris, 
Alkin, & Popham, 1974). 

Diagnostic tests -- Should they be multidimensional measures 
of skills plus measures of other characteristics that 
influence those skills? 

Measures of outcomes -- Should they sample the common core of 
objectives or sample the multitude of differential objectives? 

Program 9.2.3: Construction of Tests with Face Validity 

What test construction strategies are most useful in developing 
measures that have the face validity required by the courts? Given cur- 
rent emphasis on accountability, this concern seems particularly important 
(Klein, 1971). 



Project 9.2.3.1: Development of New Measures That Are Tied to the Purposes 
of Instruction. For the study of teaching, what is the role of 
specialized tests designed to be sensitive to different teaching strategies? All 
too often researchers use general standardized achievement tests that were 
designed for purposes other than differentiating among teaching strategies 
and which, therefore, cannot be expected to be sensitive to that end. 

Project 9.2.3.2: Development of Measures Dealing with Non-Cognitive 
Outcomes. In addition to achievement measures, there is need for the 
development of measures of important non-cognitive outcomes. The 
construction of these measures should be tied to well-developed theories (Walker, 
1974). 

Project 9.2.3.3: Development of Measures for Observations of Classroom 
Process Variables. The development of measures to assess classroom 
process variables such as time spent on a task holds promise for research 
on teaching. The rate of progress in this area of research over the 
past few years suggests a need for a totally new approach. Perhaps 
greater concern for the relation of process to outcomes would be useful. 

Program 9.2.4: Analysis of Crossed Design Achievement Tests 

Achievement tests may consist of a set of items that exist in a 
completely crossed design. The dimensions of such a design might be types 
of content and types of tasks. The complete set of items then exists in 
a two-way design with one item per cell. An example of a crossed 
design achievement test and an item analysis appears in Harris and Harris 
(1973). 
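
The layout just described can be pictured as a content-by-task table of item responses for one student. The sketch below, with hypothetical names and data, simply forms the two sets of marginal totals; it is a descriptive starting point under that assumption and is not the item analysis of Harris and Harris (1973), nor a proposed scoring model.

```python
def marginal_scores(responses):
    """responses[i][j] is 1 if the item in content row i and task column j
    was answered correctly, else 0 (one item per cell).
    Returns (score per content type, score per task type)."""
    content_scores = [sum(row) for row in responses]
    task_scores = [sum(col) for col in zip(*responses)]
    return content_scores, task_scores
```

For a student with responses `[[1, 0, 1], [1, 1, 1]]` on a test with two content types and three task types, the content totals are `[2, 3]` and the task totals are `[2, 1, 2]`. Reducing the table to a single total score would discard exactly the two-way structure that makes the design of interest.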

The problem of how to score and analyze such items appears to be 
one that deserves considerable study. Unidimensional latent trait models 
are probably inappropriate. What other models need to be developed? 
To what extent are multi-mode analyses appropriate? 

Program 9.2.5: Test Bias 

Questions of test bias may be relevant to several aspects of research 
on teaching. The possibility of test bias (differential predictive 
validity across subgroups) may be an important consideration in 
designing systems for the prediction of teacher effectiveness, a concern of 
Panel 1. Similarly, test bias is an important consideration when 
achievement measures (mastery, diagnostic, etc.) are used in tracking students. 

What are the implications of various definitions of test bias for 
differential treatment of students and teachers? Several studies (Linn 
& Werts, 1971; Schmidt & Hunter, 1974) address this question for the 
Cleary (1968) definition of test bias, but similar work is required for 
other definitions, such as those of Thorndike (1971) and Cole (1973). 
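
As commonly summarized, the Cleary definition treats a test as biased for a subgroup when the criterion predicted from the common regression line is consistently too high or too low for members of that subgroup. A rough check along those lines is sketched below, assuming a single linear regression of criterion on test score fitted to all examinees; the function name and the idea of reporting one mean prediction error per subgroup are illustrative choices, not a procedure given in the report.

```python
import numpy as np

def mean_prediction_errors(test, criterion, group):
    """Fit one common regression of criterion on test score, then return
    the mean prediction error (observed minus predicted) per subgroup.
    Errors consistently far from zero for a subgroup would suggest bias
    in the sense described above."""
    test = np.asarray(test, dtype=float)
    criterion = np.asarray(criterion, dtype=float)
    group = np.asarray(group)
    slope, intercept = np.polyfit(test, criterion, 1)
    errors = criterion - (intercept + slope * test)
    return {str(g): float(errors[group == g].mean()) for g in np.unique(group)}
```

A positive mean error for a subgroup means the common line underpredicts that subgroup's criterion performance; a negative mean error means it overpredicts. Other definitions of bias would call for different summaries.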

An additional concern is that almost all test bias studies have been 
conducted using criteria that may be presumed to be biased to the same 
degree as the predictors. Can other criteria be developed that are less 
subject to the same biases? For example, previous research shows that 
verbal tests predicted success in gunnery classes when the criterion was 
grades received from the class but not when the criterion was performance 
measurement. At the college level, test bias studies have used early, 
i.e., freshman, performance exclusively. Evidence, although it is not 
very systematic, is accumulating that suggests that if later performance 
were used as the criterion, different results would be obtained. 

Finally, almost all test bias studies have been conducted at the 
higher educational levels. There is a great need for this type of 
research at the lower educational levels. Appropriate criteria, however, 
must be developed for this research. (See Project 9.3.11.2 for an 
additional aspect of test bias.) 

Program 9.2.6: Evaluation of Profiles 

Comparative studies of teaching methods encounter technical problems 
in the evaluation of profiles of outcomes. Technical characteristics of 
the measures must be considered in the development of any composite 
indices of outcomes (Harris, 1955). In addition, problems of weighting 
the importance of a variable a priori must take into account the 
differing metrics of the variables. 

Program 9.2.7: Defining Desired Teacher Performance 

When defining teacher performance for purposes of accountability, 
is there any agreement among significant groups such as parents, 
teachers, students, and legislators? What kinds of consistencies can be 
found in the objectives underlying existing statewide teacher 
assessment programs? (For work on a related issue, see Hoepfner, Bradley & 
Doherty, 1974.) 

Program 9.2.8: Development of Measurement and Observational Procedures 
for Describing the Degrees and Types of Implementation of the Components 
of Various Teaching Processes and Programs 

The problem of estimating and describing the degree of implementation 
of programs and the components of programs is a critical one for research 
on teaching. The need for data related to the implementation 
issue is most salient in two contexts. First, when a program 
developer, teacher trainer, supervisor, etc., is attempting to train 
persons to carry out a particular program or type of instruction, he 
needs to know the extent to which the implementation is occurring. He 
needs also to be able to analyze the ways in which the program is and is 
not being implemented. In this context, such information might primarily 
serve a feedback function. Second, in evaluation studies or any study 
tying program (treatment) to student outcomes, information on the degree 
and type of implementation is essential. For example, in the analysis 
of presumed replications of a given curriculum it is essential to know 
how comparable the classroom procedures (treatments) really were. In 
evaluation studies of a comparative nature (as argued in Program 
9.1.4), implementation data are even more essential for interpretation 
and analysis of effects (Stodolsky, 1972; Bissell, 1971). 



Project 9.2.8.1: Measuring Implementation. While the methodologies 
and measurements needed for implementation research may be somewhat 
program specific, the following general approach might be useful: 

• Explore the means for collaboration between curriculum 
developers and methodologists in order to develop 
operationalized descriptions of essential components of 
curriculum. 

• Specify tolerance levels for acceptable or unacceptable 
levels of implementation. 

• Explore means for identifying nonessential or unintended 
components of a curriculum. 

• Repeat the above steps for a few diverse curricula. 

In carrying out this approach or a similar approach, the following 
types of issues should be considered: 

1. For what components of programs or types of programs can 
implementation be assessed without direct observation or 
with minimal observation? 

2. For what components of a program or types of programs is 
direct observation essential for estimating implementation? 
(See also Project 9.1.9.2.) 

3. How much data are necessary? How much and how frequently 
should monitoring be done? (The answer will probably vary 
for different classes of programs.) 

Project 9.2.8.2: Stability of Student and Teacher Behaviors. In 
dealing with the issue of how much data are necessary for implementation 
studies, an important related issue is the accumulation of knowledge about 
the stability of student and teacher behavior in general. It would be 
helpful to have a better empirical basis for estimating the stability of 
behavior and, therefore, for obtaining guidance as to the frequency and 
extent of data collection. In addition, empirical data on such matters 
would facilitate interpretation of data on classroom phenomena. Thus 
serious attention should be given to the questions: How inherently stable 
are student and teacher behaviors in classroom settings? Under what 
conditions are the behaviors relatively stable and relatively unstable? 

One approach to this question might take the form of studies in 
which teachers and students are observed intensively over a period of 
time (say a month). If sufficient data were available, various 
estimates of stability could be made regarding behaviors of different types 
and their relations to subject matter, setting, etc. For an example of 
such data see Karlson (1972). 

Program 9.2.9: Studies to Improve the Reliability of Observational 
Procedures 

Even when the stability of the phenomena being studied is known, 
the reliability of observational procedures can be problematic. When 
using on-the-spot category systems, the major concern is field reliability, 
i.e., observer agreement. In this connection, studies to explore 
the effective training of observers deserve support. While there is 
some accumulated wisdom on this subject (Gellert, 1955; Weick, 1968), 
empirical studies comparing the utility of certain alternative procedures 
for training should be carried out. 

In observational studies which use open-ended procedures, 
e.g., narrative records, there are two types of reliability: (a) field 
reliability, i.e., agreement of observers in the field; and (b) coding 
reliability, i.e., reliability of applying coding categories to 
narratives. These two types of reliability are interdependent. In 
particular, field reliability cannot be assessed without coding. Exploration 
of methods for assessing the two types of reliability as well as their 
interdependence should be supported. Finally, in the case of closed 
systems, alternative training procedures for field observers should be 
studied. 


More generally, certain technical studies of the utility of 
various approaches to recording data should be launched. For example, 
under what conditions does videotape or audiotape recording improve the 
precision of observations? What are the costs and benefits of various 
procedures for recording data? 

Program 9.2.10: Psychometric Properties of Criterion-Referenced Tests 
and Concomitant Test Construction Strategies 

Although the need for criterion-referenced tests is apparent, the 
methodology for developing them is lagging badly behind the aspirations 
of potential users. Much of classical test theory does not apply. New 
models need to be developed to deal with such problems as the fidelity 
of measures to the performances represented, the stability and 
generalizability of the measures, and the probability of misclassification 
under various conditions. As theory develops it must be translated into 
test construction strategies. 
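
As one illustration of what "probability of misclassification" might mean for a mastery decision, the sketch below assumes a simple binomial model: each of the n items is answered correctly, independently, with probability equal to the examinee's true proportion of mastery. That model, the function names, and the parameters are assumptions made here for illustration; the report does not propose a specific model.

```python
from math import comb

def prob_pass(true_p, n_items, cutoff):
    """P(observed score >= cutoff) under the assumed binomial model."""
    return sum(comb(n_items, k) * true_p ** k * (1 - true_p) ** (n_items - k)
               for k in range(cutoff, n_items + 1))

def prob_misclassified(true_p, n_items, cutoff, mastery_p):
    """Chance of a wrong decision for one examinee: a true master
    (true_p >= mastery_p) who fails, or a true non-master who passes."""
    p_pass = prob_pass(true_p, n_items, cutoff)
    return 1.0 - p_pass if true_p >= mastery_p else p_pass
```

For instance, `prob_misclassified(0.5, 2, 2, mastery_p=0.8)` gives 0.25: a non-master answering at the 50 percent level still passes a two-item test one time in four. Sketches of this kind make visible how test length and cut-off jointly control the error rate.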

APPROACH 9.3 

IDENTIFY, DEMONSTRATE, AND DISSEMINATE 

METHODOLOGIES FROM OTHER RESEARCH 
DISCIPLINES WHICH APPEAR TO HAVE MERIT 
FOR RESEARCH ON TEACHING 



The first two Approaches reflected a concern for the development of 
new design, analysis, and measurement techniques that serve the unique 
needs of research on teaching. Most of the methodology currently used 
in research on teaching, however, was originally developed in other 
research disciplines. There are at least two reasons why continued 
identification, translation, and dissemination of methodologies from other 
research disciplines seems warranted. First, in many cases, these 
borrowed methodologies have served research on teaching well. Second, where 
existing useful methodologies are available, duplication of development 
should be avoided. 

Panel members observed that historically there has been a time lag 
between the development of methodological and analytic strategies in one 
discipline, and the use of those strategies in another discipline. During
the present period of rapid development of methodologies across a
variety of research disciplines, it is becoming increasingly difficult
for workers in research on teaching to stay abreast of what is available.
At a minimum, Approach 9.3 calls for an awareness of methodological developments
in econometrics, sociology, psychology, anthropology, as well
as applied and mathematical statistics. These methodological developments
need to be screened for their potential utility in research on
teaching, and the more promising methodologies should be tried out. As
a start, the Panel attempted to identify (in the form of programs) a few
methodologies that at least on the surface appeared to have utility for
research on teaching.

This Approach is intimately related to the fourth Approach, which
calls for considering the utility of standards for improving methodological
practice in research on teaching. Both Approaches differ from the
first two in that they are designed to improve research on teaching
through the use of existing methodologies rather than through the development
of new methodologies. The difference between the third and fourth
Approach is that the third Approach attempts to capitalize on methodologies
virtually unknown to the community of researchers on teaching, while the
fourth Approach is concerned with increasing the level of methodological
awareness within that research community.

Program 9.3.1: Optimal Designs for Research on Teaching 

Evidence on a particular research problem or question usually can 
be collected in several ways. Unfortunately, the choice among designs 
is often made on the basis of what other investigators have done, irrespective
of whether their choice was optimal or whether the setting
of the earlier study was similar to that of the present one. A good 
design should, however, maximize the probability of obtaining useful 
results. Although the term useful must be defined by each investigator, 
the definition should consider a variety of factors. For example,
choosing a design solely on the basis that it has sufficient power to 
reject a false null hypothesis may be too restrictive. Clearly, the 
choice of design must be made within the constraints imposed by factors 
such as financial and administrative feasibility. 

Existing textbooks on statistical design provide only broad state- 
ments about the utility of alternative designs and little or no guidance 
as to their application in real-world research settings such as schools 
and classrooms. The Bayesian approach, however, has the potential for 
combining relevant factors into a model which allows the researcher to
select a design in a rational and clearly-defined way (Raiffa & Schlaifer,
1961). Technically, this process is called pre-posterior analysis.
Given prior experience, alternative designs and their probable results 
are analyzed relative to the utility of those results, and the design 
having the maximum utility is chosen. Another advantage of pre-posterior 
analysis is that it focuses attention on the important factors in choos- 
ing a design. The model facilitates the identification of critical points 
where precise information is necessary and, hence, where research efforts
should be directed. 

While some theoretical methods for pre-posterior analysis are available,
few practical methods have been developed. What is needed are ways
to make the methodology accessible to the performer of research on teaching,
with his perhaps unique knowledge and experience. One way to
achieve this goal is through the production of computer programs which
interrogate the researcher at critical points and present not only the
optimal design, but also an analysis of the relative importance of each
critical point to the final choice of design.
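As an illustration of what such a program might compute, the following is a minimal sketch of a pre-posterior analysis for choosing a sample size. It rests on assumptions that are not in the Panel's text: a normal prior on the true treatment difference, a known within-study standard deviation, a payoff proportional to the true difference if the new method is adopted, and a fixed cost per subject. All function and parameter names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_net_utility(n, prior_mean, prior_sd, noise_sd,
                         payoff_per_unit, cost_per_subject):
    """Expected utility of running a study of size n, then adopting the
    new method only if the posterior mean difference is positive.
    (Assumed model: normal prior, normal data with known noise_sd.)"""
    if n == 0:
        # No data: decide on the prior mean alone.
        return payoff_per_unit * max(prior_mean, 0.0)
    posterior_var = 1.0 / (1.0 / prior_sd ** 2 + n / noise_sd ** 2)
    # Before the data are seen, the posterior mean is itself normal with
    # mean prior_mean and this standard deviation.
    preposterior_sd = math.sqrt(prior_sd ** 2 - posterior_var)
    z = prior_mean / preposterior_sd
    # E[max(posterior mean, 0)] for a normal posterior mean.
    expected_gain = prior_mean * normal_cdf(z) + preposterior_sd * normal_pdf(z)
    return payoff_per_unit * expected_gain - cost_per_subject * n


def best_design(candidate_sizes, **settings):
    """Return the candidate sample size with maximum expected net utility."""
    return max(candidate_sizes,
               key=lambda n: expected_net_utility(n, **settings))


# Hypothetical inputs a researcher might supply when "interrogated".
settings = dict(prior_mean=0.0, prior_sd=1.0, noise_sd=2.0,
                payoff_per_unit=100.0, cost_per_subject=0.5)
chosen = best_design([0, 5, 10, 20, 40], **settings)
```

Rerunning `best_design` while varying one input at a time (for example, the prior standard deviation) gives a rough indication of which judgments the choice of design is most sensitive to.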

Program 9.3.2: Problems in Developing Measurement Procedures to Describe
Various Teaching Processes or Programs (Including Behaviors of
Teachers and Students)

The following is a collection of partially related questions about
problems in developing measurement procedures to describe the teaching-learning
process.

• To what extent and under what conditions is the notion of "sequence"
useful in describing processes?

• How and under what circumstances can the more complex time-series
analyses be applied to the description of teaching processes?

• How and under what circumstances can signal detection or quantal
response theory be applied to the description of teaching processes?

• How and under what circumstances can Markov processes be applied to
the description of teaching processes?

• To what extent can present multi-dimensional scaling procedures,
both metric and non-metric, be employed for meaningful reduction
of extensive collections of data describing teachers and students?
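To make the Markov-process question concrete, here is a minimal sketch of how a coded sequence of classroom events could be summarized as first-order transition probabilities. The event codes ("lecture", "question", and so on) and the function name are hypothetical, and the first-order assumption (the next event depends only on the current one) is exactly what the question above asks researchers to examine.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate first-order transition probabilities from a coded sequence.

    Returns a nested dict: probs[current][following] is the observed
    proportion of times `following` came immediately after `current`.
    """
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


# Hypothetical observation record from one classroom session.
observed = ["lecture", "question", "response", "lecture",
            "question", "response", "praise", "lecture"]
probs = transition_probabilities(observed)
```

In this toy record a teacher question is always followed by a student response, while a response is followed by praise half the time. Whether such a summary is adequate depends on whether longer-range dependencies matter.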

Program 9.3.3: Evolutionary Operation 

In what way, if any, is the concept of evolutionary operation (Box 
& Draper, 1968) useful for investigations of the teaching process?

Program 9.3.4: Organizational Development Methodology for Use in Form- 
ative Research on Teaching Strategies 

Over the past ten years, organizational development, as a field of
inquiry into the analysis of the adequate functioning of groups, has
developed a systematic methodology which, at present, is primarily used 
in industry and government. Work like that of March (1965) and Argyris 
(1971) may offer considerable insight into attempts to carry out ade- 
quate formative research on teaching strategies. 

Program 9.3.5: Computer Simulation 

The computer simulation of human behavior carried out by political 
scientists such as Newell and Simon (1961, 1968) and by psychologists 
such as Abelson (1963) might yield insights useful in research on teach- 
ing. Such insights may result in providing more resources for dynamic 
modeling of the teaching process. 

Program 9.3.6: Path Analysis and Other Models for Estimating Causal 
Relationships 

The objective of this program would be to consider the variety of
techniques used to estimate causal relationships by people in political
science and sociology and determine their applicability to research on
teaching. The simplest of the approaches is "path analysis," an approach
which has already been disseminated somewhat, at least in its most
primitive form (Werts & Linn, 1970; Duncan, 1966). A serious discussion
of the application and misapplication of path analysis in research on
teaching would be useful. In addition, research techniques from Blalock
(1964, 1971) and others, using partial correlational analyses and certain
types of multi-stage least squares analyses given assumptions about causal
ordering, might be useful to research on teaching.
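For readers unfamiliar with the technique, the following is a minimal sketch of path analysis in its simplest form: a three-variable recursive model X -> M -> Y with an additional direct path X -> Y, computed from correlations. The causal ordering is assumed, not tested, and the variable roles and function name are hypothetical.

```python
def path_coefficients(r_xm, r_xy, r_my):
    """Standardized path coefficients for the assumed recursive model
    X -> M -> Y with a direct X -> Y path, given the three correlations."""
    p_mx = r_xm                              # single predictor: path = correlation
    denom = 1.0 - r_xm ** 2
    p_yx = (r_xy - r_xm * r_my) / denom      # direct effect of X on Y
    p_ym = (r_my - r_xm * r_xy) / denom      # effect of M on Y, X held constant
    return {
        "M<-X": p_mx,
        "Y<-X": p_yx,
        "Y<-M": p_ym,
        "indirect X->M->Y": p_mx * p_ym,
    }


# Hypothetical correlations: X = teacher training, M = rated clarity,
# Y = class achievement.
paths = path_coefficients(r_xm=0.5, r_xy=0.4, r_my=0.6)
total_effect = paths["Y<-X"] + paths["indirect X->M->Y"]
```

In this model the direct and indirect effects of X sum to the observed correlation between X and Y. The decomposition is only as trustworthy as the assumed ordering, which is one reason a discussion of misapplication is warranted.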

Program 9.3.7: Scaling Methods from Consumer Research 

The objective of this program would be to consider the variety of
scaling methods developed in consumer research for possible application to
research on teaching (Crespi, 1961; Green, 1970; Gallup, 1972). 

Program 9.3.8: Generalizing from Non-Random Samples 

When data are collected on a non-random sample of teachers and 
students, is the possibility of valid inference to the complete population
eliminated? Recently, techniques have been developed for estimating re- 
lationships among variables even when marginal distributions have been 
biased (Goodman, 1972, 1973). When are such procedures appropriate for 
research on teaching? 

Program 9.3.9: Investigation of Potential Uses of Exploratory Data
Analysis

Modern data analysis entails a philosophical reorientation of statistical
practice. A scientific ideal (formulate hypothesis, design and
execute experiment, accept or reject hypothesis) is still honored, but
the scientist is also encouraged to explore all available data looking
for new hypotheses, unusual phenomena, and re-expressions of information.
Much emphasis is placed on graphic displays and other simple techniques
which enable a data analyst to know his data more intimately and can be
used without the aid of the computer (Tukey, 1972). Another emphasis, 
one that takes maximum advantage of new computers, is on robust resistant 
methods which are useful in a wide variety of real-world situations where 
the usual statistical assumptions are questionable. 

Project 9.3.9.1: Stem-and-Leaf Plots. An example of a simple data-analytic
technique is the stem-and-leaf plot (Tukey, 1970), which is a
way of rearranging data to get the pictorial advantage of a histogram
without the usual loss of information. The stem-and-leaf is about as
easy to form as a histogram, and the computing of medians and quartiles
(hinges) and the identifying of outliers is then greatly facilitated.
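A short sketch may help show why the display loses no information. It assumes non-negative integer scores and uses the tens digit as the stem; the scores and the function name are hypothetical.

```python
import statistics
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display for non-negative
    integer scores, with tens as stems and units as leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {digits}".rstrip())
    return lines


# Hypothetical test scores for one class.
scores = [47, 52, 55, 58, 61, 63, 63, 70, 91]
display = stem_and_leaf(scores)
median_score = statistics.median(scores)
```

Printing the lines with `print("\n".join(display))` shows the histogram-like shape, the empty stem in the 80s, and the isolated score in the 90s, while every original value can still be read off the display.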

Project 9.3.9.2: Robust/Resistant Regression. Robust/resistant regression
(Beaton & Tukey, 1974) is an example of an attempt to avoid the
emphasis on "fitting the unfittable" that is intrinsic to the least
squares methods of squaring residuals before minimizing. It is easy to
find or construct problems where least squares procedures fail to fit
any points well, whereas estimation and/or smoothing approaches may fit
the fittable very well while signalling, but not fitting, the outliers.
Robust/resistant regressions fit almost as well as least squares in the
ideal (Gaussian) case, and require only about 2 to 6 times as much computer
time as classical regression (Miller, 1968; Mosteller & Tukey,
1968; and Quenouille, 1949). Other analysis methods, not specifically
discussed by Tukey but applicable to outliers or data points which do
not seem to "fit" the model, are analyses using trimmed means (means
which do not include the outlying points) or medians.
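The idea of fitting the fittable while signalling the outliers can be sketched with iteratively reweighted least squares using bisquare (biweight) weights. This is one common robust scheme, offered here as an illustration and not necessarily the exact procedure in the cited work. The tuning constant 4.685 and the scale estimate (median absolute residual divided by 0.6745) are conventional choices assumed for this sketch, and the data are invented.

```python
import statistics


def weighted_line(xs, ys, ws):
    """Weighted least squares fit of y = intercept + slope * x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def biweight_line(xs, ys, tuning=4.685, iterations=20):
    """Resistant line via iteratively reweighted least squares.

    Returns the last fitted (intercept, slope) and the final weights;
    a weight of 0.0 flags a point as an outlier that was not fitted.
    """
    ws = [1.0] * len(xs)          # first pass is ordinary least squares
    for _ in range(iterations):
        intercept, slope = weighted_line(xs, ys, ws)
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        scale = statistics.median(abs(r) for r in residuals) / 0.6745
        if scale < 1e-12:         # essentially perfect fit: stop reweighting
            break
        cutoff = tuning * scale
        ws = [(1.0 - (r / cutoff) ** 2) ** 2 if abs(r) < cutoff else 0.0
              for r in residuals]
    return intercept, slope, ws


# Hypothetical data: roughly y = 2 + 3x, with one wild value at the end.
xs = list(range(10))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.2, -0.2]
ys = [2 + 3 * x + e for x, e in zip(xs, noise)]
ys[-1] = 100.0

ols_intercept, ols_slope = weighted_line(xs, ys, [1.0] * len(xs))
rob_intercept, rob_slope, weights = biweight_line(xs, ys)
```

On these data the least squares slope is pulled far above 3 by the single wild point, while the resistant fit stays close to the bulk of the data and assigns the wild point a weight of zero.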

Project 9.3.9.3: Jackknife Procedures. The jackknife procedure is
another general-purpose tool for estimation and hypothesis testing which
holds up well in a variety of situations while losing little efficiency
in ideal cases. In addition, the jackknife can be used in a number of 
situations in which other methods are unavailable or incomputable. The 
cost of the jackknife is fairly modest in typical situations. 
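A minimal sketch of the delete-one jackknife follows. It recomputes an estimator with each observation left out in turn, then uses the spread of those recomputed values to obtain a bias-corrected estimate and a standard error. The data and function names are hypothetical.

```python
import math


def jackknife(values, estimator):
    """Delete-one jackknife: return (bias-corrected estimate, standard error).

    `values` is a list; `estimator` is any function of a list of numbers.
    """
    n = len(values)
    full = estimator(values)
    leave_one_out = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    bias_corrected = n * full - (n - 1) * mean_loo
    variance = (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return bias_corrected, math.sqrt(variance)


def mean(v):
    return sum(v) / len(v)


def plug_in_variance(v):
    """Variance with divisor n (biased downward in small samples)."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)


# Hypothetical scores.
scores = [2, 4, 4, 5, 7, 8]
mean_estimate, mean_se = jackknife(scores, mean)
variance_estimate, _ = jackknife(scores, plug_in_variance)
```

For the mean, the jackknife standard error reproduces the familiar s divided by the square root of n. For the divisor-n variance, the bias correction recovers the usual divisor-(n - 1) estimate. The same function can be applied to estimators for which no convenient formula exists.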

Program 9.3.10: Analyst's Models for the Estimation of Non-Additive
Effects of Teaching in Other than Factorial Designs

In most experimental studies, the parameter of interest is one of 
location, i.e., whether or not groups differ with respect to their means. 
Sometimes, inequality of variances is also observed. Such inequality 
can indicate a non-additive model, e.g., $Y = \theta_1 X + \theta_2$, where X is a control
value for a particular student and Y is the experimental value for
that student. The model specifies both additive and multiplicative effects,
where $\theta_1$ can be thought of as a learning rate parameter. This
and other models for non-additive effects may be useful for research on
teaching (Lohnes, 1972). It should be noted that concern for non-additive
effects is related at least in part to concern for aptitude-treatment
interactions.
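The link between unequal variances and a multiplicative effect can be seen in a small sketch. Under a model of the form Y = theta1 * X + theta2 with no error, the variance of Y is theta1 squared times the variance of X, so a rate parameter different from 1 shows up as a variance difference between conditions. The paired scores below are invented, the fit is by ordinary least squares, and the function name is hypothetical.

```python
import statistics


def fit_rate_model(control, experimental):
    """Least squares estimates of (theta1, theta2) in
    experimental = theta1 * control + theta2, for paired scores."""
    mx = statistics.mean(control)
    my = statistics.mean(experimental)
    sxx = sum((x - mx) ** 2 for x in control)
    sxy = sum((x - mx) * (y - my) for x, y in zip(control, experimental))
    theta1 = sxy / sxx          # multiplicative ("learning rate") component
    theta2 = my - theta1 * mx   # additive component
    return theta1, theta2


# Hypothetical paired values: each student's control and experimental score.
control = [40, 50, 60, 70, 80]
experimental = [65, 80, 95, 110, 125]

theta1, theta2 = fit_rate_model(control, experimental)
variance_ratio = statistics.variance(experimental) / statistics.variance(control)
```

Here the treatment adds 5 points and multiplies the control score by 1.5, and the variance ratio equals 1.5 squared. A purely additive effect (theta1 equal to 1) would leave the variances equal.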



Program 9.3.11: Development of Statistical Decision Theory Models
for Monitoring the Instructional Process

Statistical decision theory has been found to have important appli- 
cation in business and economics and was introduced to education by 
Cronbach and Gleser in 1957. An advantage of decision theory (Novick,
1971; Novick & Jackson, 1970; Pollack, 1968) is that it permits several
aspects of the decision problem to be considered simultaneously in a
coherent manner. Its drawbacks are the complexity of its mathematical
formulation and the difficulty of providing some of the judgmental input
required for its implementation. The first difficulty of decision
theory (complexity of its mathematical formulation) has succumbed to
repeated attack by a large number of able statisticians. Also, greater
skill on the part of educational statisticians in formulating their
problems in relatively simple, but realistic, ways has helped simplify 
decision theory. The second difficulty (input required for implemen- 
tation) is being reduced as interactive computer systems become 
available to help investigators quantify coherently their utilities 
and prior probabilities. 

Project 9.3.11.1: Monitoring Individualized Instruction Programs.
One area in which decision theory is useful is that of monitoring individualized
instructional programs (Hambleton, 1973). In such programs,
decision points are continually appearing and a rational and
coherent procedure for making the advance-return decision is required.
While some work has been done, much more is needed. Methods for
choosing among various instructional modes are needed, as are methods
of combining serially-gathered data on individual students.
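One simple way to frame the advance-return decision is sketched below. It assumes, for illustration only, a uniform prior on the student's true proportion correct, a binomial model for the unit test, a mastery cutoff, and a two-valued loss (one loss for advancing a non-master, another for returning a master to instruction). None of these choices comes from the Panel's text, and all names are hypothetical.

```python
from math import comb


def prob_mastery(correct, items, cutoff):
    """Posterior probability that the true proportion correct is at least
    `cutoff`, under a uniform prior and a binomial test model.

    With a uniform prior the posterior is Beta(correct + 1, items - correct + 1),
    and its upper tail equals P(Binomial(items + 1, cutoff) <= correct).
    """
    n = items + 1
    return sum(comb(n, j) * cutoff ** j * (1.0 - cutoff) ** (n - j)
               for j in range(correct + 1))


def advance_or_return(correct, items, cutoff,
                      loss_false_advance, loss_false_return):
    """Choose the action with the smaller posterior expected loss."""
    p_master = prob_mastery(correct, items, cutoff)
    expected_loss_advance = loss_false_advance * (1.0 - p_master)
    expected_loss_return = loss_false_return * p_master
    return "advance" if expected_loss_advance < expected_loss_return else "return"


# Hypothetical case: 9 of 10 items correct, mastery defined as 80 percent.
p = prob_mastery(9, 10, 0.8)
equal_losses = advance_or_return(9, 10, 0.8, 1.0, 1.0)
cautious = advance_or_return(9, 10, 0.8, 3.0, 1.0)
```

With equal losses the student is advanced, but if advancing a non-master is judged three times as costly as an unnecessary return, the same test result leads to the opposite decision. This illustrates why the utilities, and not only the data, need to be made explicit.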

Project 9.3.11.2: Decision-Theoretic Approach to Problems of Test
The area of bias in selection, or culture-fair testing, is another
in which a decision-theoretic formulation can have general applicability.
While simple solutions are possible, much needs to be done to study the
relationship between students' and institutions' utility structures and to
ascertain how differences between these structures affect acceptability of
selection and self-selection fairness. Also, much work needs to be done
with sophisticated utility structures and with multiple predictor and 
multiple outcome formulations.

APPROACH 9.4



CONSIDER THE UTILITY OF STANDARDS FOR
IMPROVING METHODOLOGICAL PRACTICE 
IN RESEARCH ON TEACHING 



Two reasons were suggested for attempting to develop methodological
standards for research on teaching. The first was that some research
on teaching contains methodological flaws, many of which are
common across time and across studies. The second reason was that
much research on teaching has not been cumulative. It is difficult, and
sometimes impossible, for teachers or educational researchers to pool
results from studies dealing with common interest areas.

Setting methodological standards has been a fairly common practice, 
motivated by the hope that through the establishment of a set of minimal 
levels or standards of acceptable quality, the consumer will be pro- 
tected. Perhaps the most relevant example is the APA-AERA-NCME set of 
standards for test publishers. Several groups are also considering the
possibility of standards for program evaluation. The consensus of the
Panel, however, was that it is neither possible nor desirable to legislate
through standards the methodological quality of research on teaching.

Researchers must take a creative approach to data analysis and
be willing to use multiple strategies in order to obtain the full utility
of their data. It seems likely that methodological standards for research
on teaching would militate against such practices and, instead,
promote rather routine and unthinking analyses. Further, research on
teaching has special yet varying methodological needs which a single
set of standards could not begin to address. It was decided, therefore,
to discourage the development of methodological standards. In place
of standards, the Panel recommended several programs to facilitate communication
of information about how to handle methodological problems
that are of major concern in research on teaching.

This Approach is based on two major ideas: (a) the establishment of
archival data that can be used for secondary analysis and for illustration
of the results of alternative design and analysis strategies; and
(b) the establishment of procedures for disseminating the results of
Approaches 9.1, 9.2 and 9.3 to persons engaged in research on teaching.
The goal is to encourage those doing research on teaching to use the
"best known practices" in measurement, design, and data analysis.

Program 9.4.1: Secondary Analyses and Alternative Designs

It seems desirable to commission competent educational research
methodologists to review and critique past studies in the field of research
on teaching. These reviews of past research should contain a
variety of secondary analyses and compare the utility of those strategies
to the initial analysis. In addition, they should identify and describe
alternative design and analysis strategies for addressing the
research question that could not be illustrated through secondary analysis.
This information should be documented and made available for wide
dissemination, particularly to the educational community interested in
research on teaching.

Program 9.4.2: Research Data Archive 

Professional journals have editorial policies and [...]
greatly restrict the amount of information and explora [...]
be of interest to other researchers. While it is not [...]
perhaps desirable) to attempt changes in existing publication practices,
it is nevertheless true that some interested consumers of the
literature could profit from more complete reports. The Panel suggests
that an archive be created which would allow researchers to submit a
more inclusive summary of their total research findings and their
actual research data at some summary level. Two main concerns are
directly related to this. First, what kinds of research results and
summary data are most useful to archive? Second, where should this
archive be placed and how can it be made readily accessible to researchers
of teaching? These archival data are directly related to
facilitating Program 9.4.1 on Secondary Analyses and Alternative Designs.

Program 9.4.3: Training Programs

It was suggested that professional organizations such as the
American Educational Research Association be encouraged to sponsor
methodological training sessions for researchers with on-going
projects in research on teaching. These training sessions should
be applied and project-based, not theoretical in natur [...]
training suggestion was that fellowship programs be cr [...]
cally for mid-career researchers. These would be stri [...]
that would bring into the university community persons [...]
involved in research on teaching. At the university, [...]
researchers would be able to take research methodology courses and tap
faculty ideas relating to their specific research.

Program 9.4.4: Providing the Methodological Capacity to Support
Research on Teaching

Ideally, a person conducting research on teaching should have not
only an interest in and understanding of the research issue, but also
the methodological sophistication to identify and, where necessary,
adapt methodology for his particular research needs. Unfortunately, this
is not always the case. The previous programs in Approach 9.4 dealt with
potentially long-run solutions to the problem through training. It would
be helpful, however, if there were some short-run strategies. One possible
strategy would be to make competent methodologists more readily
available to persons engaged in research on teaching, and to do so in a
way that sustains their availability over the duration of a research project.
This might be accomplished by partially supporting methodological
specialists on the staffs of state departments of education, research and
development centers, or laboratories, specialists who would have specific
assignments to research projects on teaching.

Program 9.4.5: Test Evaluation Manuals

To guide selection from existing measurement strategies for research
on teaching, it is suggested that test evaluation manuals be
published for different areas of the teaching-learning process. The
U.C.L.A. Center for the Study of Evaluation has completed several
manuals on tests of student characteristics. Similar manuals could be
devised stressing other areas of the teaching-learning process. Within
each area, such as measuring teacher effectiveness, an extensive
search should be made for relevant instruments, both published and experimental,
in order to ascertain the number and quality of instruments
that have already been developed to assess variables in that particular
area.

Each test evaluation manual should give critical information about
the relevant instruments such as:

1. a summary of the purpose of the test

2. the type of instrument (e.g., interview schedule, self-report)

3. evaluations of the quality of the instrument

4. a sampling of actual test items.

TENTATIVE PRIORITY ESTIMATES



At the close of the Conference, Panel members were asked to rate
each program (or each project, where such was specified for a particular
program) on the basis of its judged importance to research on teaching.
The criteria for judged importance were left to the discretion of the
individual members, but clearly the ratings must be interpreted within
the context of the Panel's concerns, i.e., research methodology. Since
the Panel was small, since its members were subject to shifts in set as
they focused on specific problem areas, and since the ratings were done
at the end of an exhausting set of sessions, they should not be overinterpreted.
Nonetheless, they are presented here as a stimulus to the
reader to make similar comparisons among programs.

The ratings were made on a scale ranging from 1 (of little importance)
to 3 (of great importance). Table 1 shows the resulting
order of programs within each of the four Approaches. Programs which
were not rated by the Panel and which cannot therefore be located in
the ordering are nonetheless included at the bottom of the listings.

Although much useful research on teaching has been conducted, the
utility of some of the research has been limited because of methodological
problems. In some cases, appropriate methodology was not available; in
other cases, established best practices were not followed. In addition,
there have been cases where methodologies were borrowed from other research
disciplines without a careful rethinking of the assumptions involved. Thus,
the goal adopted by Panel 9 was to improve the validity and utility of
measurement, design, and analysis in research on teaching. To that end,
four Approaches were adopted covering the stimulation of new methodological
knowledge as well as identification and translation of useful methodological
knowledge from other disciplines.

The first Approach called for the development and testing of new design
and analysis strategies. Perhaps the major impression left by reviews of
current research on teaching is that problems of design and analysis are
encountered at many stages, and are solved, if at all, in an imitative or
derivative fashion drawing on analogies with earlier studies, especially
those in agriculture. The Panel felt that it is time to put forth more
systematic efforts toward developing principles for the design and analysis
of studies within the special and possibly unique context of problems of
education in general and the study of teaching in particular. Solutions
are needed for design and analysis problems such as cumulating results from
distinct but related studies, controlling the influences of confounding
variables, and studying longitudinal effects.

The second Approach called for an increased understanding of existing
measurement strategies for research on teaching and, where appropriate, the
development of new measurement strategies. Much of the history of measurement
in the behavioral sciences relates to concerns for measuring individual
student aptitudes and achievement. Although this work has been and will
continue to be of value to research on teaching, there are other important
aspects of measurement. Greater attention should be given to problems of
test bias. Psychometric theory must be developed to support the
criterion referenced test movement. Better measures of the
teaching/learning process are required, particularly in natural
settings. Yet another example is the need for group assessment
measures as contrasted with measures designed to assess individual
differences.

Current and pending legislation has given a sense of urgency 
to the solution of these design, analysis, and measurement problems.
Thirty-one states are now considering laws requiring all applicants
for a teaching license to demonstrate their teaching effectiveness.
One example is the Stull Act, 1972, of California, which requires all
school districts to evaluate their teachers. 

The first two Approaches reflected a concern for the development
of new design, analysis, and measurement techniques which serve the 
unique needs of research on teaching. Most of the methodology 
currently used in research on teaching, however, was originally de- 
veloped in other research disciplines. There are at least two 
reasons why continued identification, translation, and dissemination 
of methodologies from other research disciplines (Approach 3) seems 
warranted. First, in many cases, these borrowed methodologies have
served the researchers of teaching quite well. Second, where existing
useful methodologies are available, duplication of development should
be avoided. Several potentially useful methodologies were identified.

The fourth Approach considered the utility of setting standards 
of methodological practice within research on teaching. The con- 
sensus of the Panel was that it is neither desirable nor possible to 
legislate (through standards) the methodological quality of research 
on teaching. Researchers must take a creative approach to data
analysis and be willing to use multiple strategies in order to obtain
full utility of their data. It seems likely that methodological
standards for research on teaching would militate against such
practices and, instead, promote rather routine and unthinking analyses.
Further, research on teaching has special yet varying methodological
needs which one set of standards could not begin to address. It was
decided, therefore, to discourage the development of methodological
standards. In place of standards, the Panel recommended several
programs to facilitate communication of information about how to
handle methodological problems that are of major concern in research
on teaching.